perm filename CNTOUR[0,BGB] blob sn#117920 filedate 1974-08-27 generic text, type C, neo UTF8
{⊂C;<N;αIMAGE CONTOURING.;λ30;P82;I425,0;JCFA} SECTION 7.
{JCFD}    VIDEO IMAGE CONTOURING.
{λ10;W250;JAFA}
	7.0	Introduction to Image Analysis.
	7.1	CRE - An Image Processing System.
	7.2	Thresholding.
	7.3	Contouring.
	7.4	Polygon Nesting.
	7.5	Contour Segmentation.
	7.6	Related and Future Image Analysis.
{λ30;W0;I900,0;JU;FA}
⊂7.0	Introduction to Image Analysis.⊃

	Simply put, image analysis is the inverse of image synthesis.
From this point of view, the usually difficult question of "analysis
into what?" is answered by the answer to the question "synthesis
from what?". Since a 3-D geometric model is adequate (and necessary)
for  synthesizing digital  television pictures,  it is  reasonable to
suppose that such a model  is an appropriate subgoal in  the analysis
of  television pictures.  Such  an analysis  into a  3-D  model would
provide  a   useful  data   reduction  as   well  as   a   convenient
representation for  solving robotics  problems such as  manipulation,
navigation  and  recognition.   This  approach to  image  analysis is
somewhat heretical; the orthodox method is to extract features from
2-D images, which features are then used directly for the desired
task. On  the other hand,  vision by inverse computer graphics may be
viewed  as  an extreme  form  of  feature  finding,    involving  the
extraction of a set of basic geometric features which are combined to
form a superfeature,   a 3-D  model.  The  rest of this  introduction
enumerates some of the  kinds of information available in  a sequence
of images  and some of the kinds  of data structures for representing
images. An image is a 2-D data structure representing the contents of
a  rectangle from  the pattern  of  light formed  by a  thin lens;  a
sequence of images in time is called a film.

	Three basic kinds of information in  an image are photometric
information,   geometric  information,  and  topological information.
Fundamentally, geometry concerns distance  measure.  The geometry  of
an image  is based on coordinate  pairs that are associated  with the
elements  that form the  image.  From the  coordinates such geometric
properties as  length,   area,  angle  and moments  can be  computed.
Photometry  means light measure,   although physical  measurements of
light may include power,   hue,  saturation, polarization and  phase;
only the relative  power between points  of the same image  is easily
available  to a  computer using  a television  camera. The  taking of
color images is  possible at Stanford by  means of filters;  however,
the acquisition  of color is  inconvenient and has not  been seriously
pursued  in  the present  work.  Finally,   topology has  to  do with
neighborhoods,   what  is  next  to  what; topological  data  may  be
explicitly  represented by  pointers  between related  entities,   or
implicitly represented by the format of the data.

	Three basic  kinds of image  data structures are  the raster,
the contour  map and the mosaic. A raster  image is a two dimensional
integer valued array of pixels; a pixel ("picture element") is a
single sample position on a vidicon.  Although the real shape of a
pixel is probably that of a blunt ellipse, the fiction that pixels
tessellate the image into little rectangles will be adopted. For other
theoretical purposes  the array is  assumed to be  formed by sampling
and truncating a  two dimensional,  smooth, infinitely  differentiable
real valued function. A contour image is like a geodesic contour map:
no two contours ever cross and all the contours close.  A mosaic
image (or tessellation) is like a ceramic tile mosaic: no two regions
ever overlap and the whole image is completely covered with tiles.
Further  useful restrictions might  be made concerning  whether it is
permitted to  have  tiles with  holes  surrounding smaller  tiles  or
whether it  is permitted for a  tile to have points  that are thinner
than a single pixel.
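The three kinds of image data structures can be shown in miniature. This
Python sketch uses hypothetical literal values and field names; CRE's actual
structures are packed machine words and node/link records.

```python
# Miniature stand-ins for the three image data structures named above.

# A raster image: a 2-D integer-valued array of pixel samples.
raster = [[0, 1, 1],
          [0, 2, 1],
          [0, 0, 0]]

# A contour image: closed polygons (no two ever cross, all close),
# each a list of (row, column) vertices tagged with an intensity level.
contour_image = [
    {"level": 1, "polygon": [(0, 1), (0, 3), (2, 3), (2, 1)]},
]

# A mosaic (tessellation): one region label per pixel; regions never
# overlap and the labels cover the whole image.
mosaic = [[1, 2, 2],
          [1, 3, 2],
          [1, 1, 1]]
```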

	Given a raster image,  the usual visual analysis approach  is
to find  the  features.   One canonical  geometric  image feature  is
called the <edge> and the places where edges are not found are called
<regions>. For a naive start,  an  edge can be defined as a locus  of
change in  the image function.   Edges and regions  are complementary
sides  of the same slippery concept;  the concept is slippery because
although a continuous function of two variables and a  graph of edges
are each  well known mathematical objects the  conversion of one into
the other is a poorly understood process that depends greatly on one's
motives and resources.  A computational definition of the region/edge
notion   would   include  a   procedure  for   converting   a  raster
approximation of an image function into a  region/edge representation
based on parameters which are conceptually elegant.

⊂7.1	CRE - An Image Processing Sub-System.⊃

	The acronym CRE stands for "Contour,   Region,  Edge". CRE is
a solution  to the problem of finding contour  edges in a sequence of
television pictures and of  linking corresponding edges and  polygons
from  one picture  to the  next.   The  process is  automatic and  is
intended to run without human intervention. Furthermore,  the process
is bottom up; there are no inputs that anticipate  the content of the
given television images.  The output of CRE is a 2-D contour map data
structure which  is  suitable  input to  the  3-D  modeling  program,
GEOMED. Five design choices  that determine the character of  CRE are
listed  in Box  7.1. The  design  choices are  ordered from  the more
strategic  to the  more  tactical;  the  first  three  choices  being
research  strategies,  the  latter   two  choices  being  programming
tactics.  Adopting these design choices led to image contouring and
contour map structures similar to those of Krakauer (71) and Zahn (66).
{|;JA;λ10;}
BOX 7.1 {JC} CRE DESIGN CHOICES
		1. Dumb vision rather than model driven vision.
		2. Multi image analysis rather than single image analysis.
		3. Total image structure imposed on edge finding; rather
		   than separate edge finder and image analyzer.
		4. Automatic rather than interactive.
		5. Machine language rather than higher level language.
{|;λ30;JUFA}
	The first design  choice does not  refer to the issue  of how
model dependent a  finished general vision system will be (it will be
quite model dependent),   but rather to the  issue of how one  should
begin  building such  a system. The best  starting
points are  at the two apparent extremes of nearly total knowledge of
a particular  visual  world or  nearly total  ignorance.   The  first
extreme involves  synthesis (by computer graphics) of  a predicted 2-D
image, followed by comparing the predicted and a perceived image  for
slight differences  which are  expected but  not yet  measured.   The
second  extreme involves  analyzing perceived images  into structures
which can  be readily  compared for  near equality  and measured  for
slight differences;  followed by the  construction of a  3-D geometric
model of  the perceived world. The point is that in both cases images
are compared,  and in both cases the 3-D  model initially (or finally)
contains specific  numerical data on the geometry  and physics of the
particular world being looked at.

	The second design choice, of multi image analysis rather than
single  image analysis,    provides a  basis for  solving  for camera
positions and feature  depths.   The third design  choice solves  (or
rather avoids)  the problem of  integrating an edge  finder's results
into  an image. By using a very  simple edge finder, and by accepting
all the edges found, the image structure is never  lost.  This design
postpones the  problem of interpreting photometric  edges as physical
edges. The fourth choice is a resolution to write an image  processor
that does not require operator assistance or manual parameter tuning.
The final design choice of using machine language was for the sake of
implementing node link data structures that are processed one hundred
times faster than LEAP, ten times  faster than compiled LISP and that
require  significantly less memory than  similar structures in either
LISP or  LEAP. Furthermore  machine code assembles  and loads  faster
than  higher level  languages; and  machine code  can  be extensively
fixed and altered without recompiling.

	It is my impression that CRE itself does not raise any really
new scientific problems; nor does it have any really new solutions to
the old  problems; rather CRE is another  competent video region edge
finding program with its own set  of tricks.  However, it is  further
my impression  that the particular  tricks for nesting  and comparing
polygons in CRE are original programming techniques. As a part of the
larger feedback  system,   CRE  is  a necessary,   but  not  entirely
satisfactory implementation of pure bottom up image analysis.

	CRE  consists  of  five  steps:  thresholding,    contouring,
nesting,    smoothing and  comparing.   Thresholding,  contouring and
smoothing perform conversions between two different kinds  of images.
Nesting  and contouring  compute topological  relationships within  a
given  image  representation.  In summary  the  major  operations and
operands are as listed in Box 7.2; the VIC-images are Video Intensity
Contour images and the ARC-images are contours that have been smoothed.

{|;T300,600,850;JAFA;λ10;}
BOX 7.2 {JC} CRE DATA TRANSFORMATIONS.
	~MAJOR OPERATION~	~OPERAND~	~RESULT~.
	1. THRESHOLDING: 	6-BIT-IMAGE,	1-BIT-IMAGES.
	2. CONTOURING:	1-BIT-IMAGES,	VIC-IMAGE.
	3. NESTING:	VIC-IMAGE,	NESTED-VIC-IMAGE.
	4. SMOOTHING:	VIC-IMAGE,	ARC-IMAGE.
	5. COMPARING:	IMAGE & FILM,	FILM.
{|;T-1;JU;λ30;}
	The initial  operand is a  6-bit video  raster, which in  the
present implementation is coerced into a window of 216 rows by 288
columns; intermediate operands  consist of  1-bit rasters named  PAC,
VSEG and HSEG which are explained below, as well as a raster of links
named  SKY which is  used to  perform the polygon  nesting. The magic
window size 216 by 288 was derived by considering the largest product
of powers of  two and three that would fit  within a video image. The
final result is a  node/link structure composed  of several kinds  of
nodes: vectors, arcs, polygons, lamtens (lamina inertia tensors),
levels,  images and the film node.

	Although the natural order of operations is sequential from
image thresholding to image comparing, in order to keep memory size
down the first four steps are applied one intensity level at a time,
from the darkest cut to the lightest cut (only nesting depends on
this sequential cut order); comparing is applied to whole images.
Figure 7.1  illustrates an initial video image  and its corresponding
contour image.   The contoured image  consists of thirteen  intensity
levels and took 45 seconds to compute (on  a PDP-10, two microsecond memory).
The final CRE data structure was composed of 1996 nodes.

⊂7.2	Thresholding.⊃

	Thresholding, the first and easiest step of CRE,  consists of
two subroutines,   called THRESH and PACXOR.  THRESH converts a 6-bit
image into a 1-bit image with respect to a given threshold  cut level
between zero for black and sixty-three for light. All pixels equal to
or greater than the cut, map into a one; all the pixels less than the
cut, map  into zero. The  resulting 1-bit  image is  stored in a  bit
array  of 216 rows  by 288 columns  (1728 words,   36 bits  per word)
called the PAC  (picture accumulator)  which was named  in memory  of
McCormick's ILLIAC-III.   After THRESH,   the  PAC contains blobs  of
bits.   A blob is defined as "rook's move" connected; that is,
every bit of a blob can be reached by horizontal or vertical moves
from any other bit without having to cross a zero bit or having to
make a diagonal (bishop's) move.  Blobs may of course have holes; or
equivalently, a blob always has one outer perimeter polygon, and may
have one, several or no inner perimeter polygons. This blob and hole
topology is recoverable from the CRE  data structure and is built  by
the nesting step.
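The THRESH rule can be restated in a few lines. The following is an
illustrative Python sketch of the cut rule only; CRE actually packs the
result 36 bits per word into the PAC, which is not reproduced here.

```python
def thresh(image, cut):
    """Sketch of THRESH: map each 6-bit pixel (0 = black .. 63 = light)
    to 1 if it is greater than or equal to the cut, else to 0."""
    return [[1 if pixel >= cut else 0 for pixel in row] for row in image]

pac = thresh([[0, 10, 40],
              [63, 5, 32]], cut=32)
# pac == [[0, 0, 1], [1, 0, 1]]
```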

{π}	Next,   PACXOR copies the  PAC into  two slightly larger  bit
arrays named HSEG and  VSEG. Then the PAC is shifted down one row and
exclusive ORed into the HSEG array; and the PAC is shifted right one
column  and  exclusive ORed  into  the  VSEG  array to  compute  the
horizontal and vertical border bits of the PAC blobs.  Notice that
technically this is the very heart of the edge finder of CRE;
namely, PACXOR is the mechanism that converts regions into edges.
Edge tracing is the only operation CRE performs on the 1-bit rasters;
although Boolean  image processing has caught the  eye of many vision
programmers (perhaps because it resembles an array automaton or the
game Life), one rapidly discovers that raster operations alone are too
weak  to do  anything interesting that  can't already  be done better
analytically in  a  raster of  numbers  or  topologically in  a
node/link data structure.
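The shift-and-XOR border extraction can be sketched as follows. This is a
Python illustration; the one-extra-row/one-extra-column padding convention
is an assumption, and CRE performs the same operation with whole-word
Boolean instructions rather than per-bit loops.

```python
def pacxor(pac):
    """Sketch of PACXOR: XOR the bit raster against itself shifted one
    row (giving HSEG) and one column (giving VSEG).  A set bit marks a
    horizontal or vertical crack between a 0 region and a 1 region."""
    rows, cols = len(pac), len(pac[0])
    # HSEG: one extra row; bit set where pac[r] differs from pac[r-1].
    hseg = [[(pac[r][c] if r < rows else 0) ^ (pac[r - 1][c] if r > 0 else 0)
             for c in range(cols)]
            for r in range(rows + 1)]
    # VSEG: one extra column; bit set where pac[.][c] differs from pac[.][c-1].
    vseg = [[(pac[r][c] if c < cols else 0) ^ (pac[r][c - 1] if c > 0 else 0)
             for c in range(cols + 1)]
            for r in range(rows)]
    return hseg, vseg
```

For a single one-bit blob the result is its four border cracks: two in
HSEG (above and below) and two in VSEG (left and right).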

⊂7.3	Contouring.⊃

	Contouring converts the bit arrays HSEG and VSEG into
vectors and polygons. The contouring itself is done by a single
subroutine named MKPGON, "make polygon".  When MKPGON is called, it
looks for the uppermost left non-zero bit in the VSEG array. If the
VSEG array is empty, MKPGON returns NIL. However, when the bit is
found, MKPGON traces and  erases the polygonal outline to  which that
bit belongs  and returns a polygon  node with a ring  of vectors. The
MKPGON trace can go in four directions: north and south along vertical
columns of bits in the VSEG array, or  east and west along horizontal
rows  of the HSEG array.  The trace starts by  heading south until it
hits a turn; while heading south MKPGON must check first for a turn to
the east (indicated by a bit in HSEG); next for no turn (continue
south); and last for a turn to the west. When a turn is encountered,
MKPGON creates a vector node representing the run of bits between the
previous turn  and the present turn.   The trace  always ends heading
west bound.  The outline so traced can be either the edge of a blob
or a hole; the two cases are distinguished by looking at the
VIC-polygon's uppermost left pixel in the PAC bit array.

	There   are  two  complexities:   contrast  accumulation  and
dekinking.   The  contrast  of  a  vector  is  defined  as  (QUOTIENT
(DIFFERENCE (Sum  of pixel values on  one side of the  vector)(Sum of
pixel values  on the other side of the vector)) (length of the vector
in pixels)).  Since vectors are always either horizontal or vertical
and are construed as being on the cracks between pixels, the
specified summations refer to the pixels immediately to either side
of the vector. Notice that this definition of contrast will always
give a positive contrast for the vectors of a blob and a negative
contrast for the vectors of a hole.
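Rewritten from the LISP-style definition above, vector contrast is just a
normalized sum difference. In this Python sketch the two argument lists
(names are my own) stand for the pixel samples on either side of the crack:

```python
def contrast(side_a, side_b):
    """Contrast of a vector: (sum of pixels on one side minus sum of
    pixels on the other side) divided by the vector length in pixels."""
    assert len(side_a) == len(side_b)     # one sample per crack pixel
    return (sum(side_a) - sum(side_b)) / len(side_a)

contrast([40, 42, 41], [10, 12, 11])
# → 30.0  (bright side summed first: positive, as for a blob vector)
```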

	The  contours  that  have  just  been  traced   would  appear
"sawtoothed" or "kinky"; the terms "kink", "sawtooth" and "jaggy" are
used  to express  what  seems to  be wrong  about the  lowermost left
polygon in  Figure 7.2.  The problem  involves doing  something to  a
rectilinear quantized set of segments,  to make its continuous nature
more evident. In CRE, the jaggies solution (in the subroutine MKPGON)
merely positions  the  turning locus  diagonally off  its grid  point
a little  in  the  direction (northeast,    northwest,   southwest  or
southeast) that  bisects the  turn's right  angle.   The distance  of
dekink vernier positioning is always less than half a pixel, but
greater for brighter cuts and less for the darker cuts, in order to
preserve the nesting of contours.  The sawtoothed and the dekinked
versions of  a polygon have  the same number  of vectors.   I am very
fond  of  this   dekinking  algorithm  because   of  its   incredible
efficiency; given that you have a north,  south,  east,  west polygon
trace  routine (which  handles image  coordinates packed  row, column
into one word); then dekinking requires only one more ADD instruction
execution per vector!
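The dekinking rule can be illustrated as below. This Python sketch works on
explicit (row, column) pairs with a quarter-pixel offset; the delta value
and corner encoding are assumptions, whereas CRE's single-ADD trick works
on coordinates packed into one machine word.

```python
def sgn(x):
    """Sign of x as -1, 0 or +1."""
    return (x > 0) - (x < 0)

def dekink(corners, delta=0.25):
    """Sketch of dekinking: nudge each right-angle turn locus diagonally,
    along the bisector of the turn, by less than half a pixel.  The
    number of vertices is unchanged."""
    out = []
    n = len(corners)
    for i in range(n):
        r, c = corners[i]
        pr, pc = corners[i - 1]            # previous turn (ring wraps)
        nr, nc = corners[(i + 1) % n]      # next turn
        inr, inc = sgn(r - pr), sgn(c - pc)    # direction entering the turn
        outr, outc = sgn(nr - r), sgn(nc - c)  # direction leaving the turn
        # For an axis-aligned right-angle turn, the sum of the entering
        # and leaving unit directions points along the diagonal bisector.
        out.append((r + delta * (inr + outr), c + delta * (inc + outc)))
    return out
```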

⊂7.4	Polygon Nesting.⊃

	The nesting problem is to decide  whether one contour polygon
is within another. Although easy in the two-polygon case, solving
the nesting of many polygons with respect to each other becomes
n-squared expensive in either compute time or memory space.  The
nesting  solution in  CRE sacrifices memory  for the  sake of greater
speed and requires a 31K array, called the SKY. CRE's accumulation of
a properly nested tree of  polygons depends on the order of threshold
cutting going from  dark to  light. For  each polygon  there are  two
nesting steps:  first, the polygon  is placed in  the tree  of nested
polygons by the  subroutine INTREE; second,  the polygon is placed in
the SKY array by the subroutine named INSKY.

{π}	The SKY array is 216 rows of 289 columns of  18-bit pointers.
The name "SKY" came about because the array used to represent the
farthest away regions, or background, which in the case of a robot
vehicle is the real blue sky. The sky contains vector pointers, and
would  be more  efficient  on a  virtual memory  machine  that didn't
allocate unused pages of its address space.  Whereas most computers
have more memory containers than address space, computer graphics and
vision might be easier to program in a memory with more address space
than physical space, i.e. an almost empty virtual memory.

	The first part of the INTREE routine finds the  surrounder of
a given polygon by scanning the  SKY due east from the uppermost left
pixel  of the  given polygon.   The  SON of  a polygon is  always its
uppermost  left vector.  After  INTREE,   the  INSKY  routine  places
pointers to  the vertical vectors of  the given polygon  into the sky
array. The second part of the INTREE routine checks for and fixes  up
the case  where the new  polygon captures a  polygon that  is already
enclaved. This only happens when two or more levels of the image have
blobs that  have  holes.   The  next  paragraph explains  the  arcane
details of fixing up the tree links of multi level hole polygons; and
may  be skipped by everyone  but those who might  wish to implement a
polygon nester.
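The surrounder scan of INTREE's first part can be sketched like this. In
this Python stand-in the SKY holds one polygon id (or None) per cell rather
than CRE's 18-bit vector pointers, and the scan simply runs along the row
from the given polygon's uppermost left pixel:

```python
def find_surrounder(sky, row, col):
    """Sketch of INTREE part one: scan the SKY from a polygon's
    uppermost left pixel until a stored polygon id is hit; that
    polygon is the surrounder.  Running off the edge means the
    surrounder is the background (the "sky")."""
    for c in range(col + 1, len(sky[row])):
        if sky[row][c] is not None:
            return sky[row][c]
    return None      # no enclosing polygon: background
```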

	Let the given polygon  be named Poly; and let  the surrounder
of Poly be  called Exopoly; and assume that Exopoly surrounds several
enclaved polygons called  "endo's", which are  already in the  nested
polygon tree. Also, there are two  kinds of temporary lists named the
PLIST and  the NLIST. There is one PLIST which is initially a list of
all the ENDO polygons on  Exopoly's ENDO ring. Each endo in  turn has
an NLIST  which is initially  empty.  The  subroutine INTREE re-scans
the sky array for the polygon due east (to the left) of the uppermost
left vector of each endo polygon on the PLIST, (Exopoly's ENDO ring).
On such re-scanning (on behalf of, say, an Endo1), there are four
cases: <No change>; the scan returns Exopoly, which is Endo1's
original EXO. <Poly captures Endo1>; the scan returns Poly, indicating
that Endo1 has been captured by Poly. <My brother's fate>; the scan
hits an Endo2 which is not on the PLIST, which means that Endo2's EXO
is valid and is also the valid EXO of Endo1. <My fate delayed>; the scan
hits an Endo2 which is still on the PLIST, which means that Endo2's
EXO is not yet valid but when discovered it will also be Endo1's EXO,
so Endo1 is CONSed onto Endo2's NLIST. When an endo polygon's EXO has
been rediscovered, then  all the  polygons on that  endo's NLIST  are
also placed  into the polygon  tree at that  place. All of  this link
crunching  machinery takes half a page of  code and is not frequently
executed.

⊂7.5	Contour Segmentation.⊃

	In  CRE  the  term <segmenting>  refers  to  the  problem  of
breaking a 1-D manifold (a polygon) into simple functions (arcs). The
segmenting step converts the polygons of vertical and horizontal
vectors into polygons of arcs.  For the present, the term "arc" means
"linear arc", which is a line segment. Fancier arcs, circular and
cubic spline, were implemented and thrown out mostly because they were
of no use to higher processes such as the polygon compare, which would
have to break the fancy arcs back down into linear vectors for
computing areas, inertia tensors or mere display buffers.

	Segmenting is applied to each polygon of  a level.  To start,
a ring of two arcs is formed (a bi-gon) with one arc at the uppermost
left and  the  other  at the  lowermost  right of  the  given  vector
polygon. Next a recursive make arc operation, MKARC, is applied to
the two initial arcs. Since the arc given to MKARC is in a one to one
correspondence with a doubly  linked list of vectors; MKARC  checks to
see whether each point on  the list of vectors is close enough to the
approximating arc.  MKARC returns the  given arc as good enough  when
all the sub vectors fall within a given width; otherwise MKARC splits
the arc in two  and places a new arc vertex on the vector vertex that
was farthest away from the original arc.
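The recursive split that MKARC performs is essentially the following. This
is a Python sketch of the split-at-farthest-vertex strategy described
above; the contrast-dependent width tolerance and the doubly linked vector
rings of the real routine are not reproduced.

```python
def mkarc(points, tol):
    """Approximate a chain of vertices by its chord; if some vertex
    lies farther than tol from the chord, split at the farthest
    vertex and recurse on the two halves."""
    (x0, y0), (x1, y1) = points[0], points[-1]
    dx, dy = x1 - x0, y1 - y0
    norm = (dx * dx + dy * dy) ** 0.5 or 1.0   # guard degenerate chord
    # perpendicular distance of each vertex from the chord
    dists = [abs(dy * (x - x0) - dx * (y - y0)) / norm for x, y in points]
    k = max(range(len(points)), key=lambda i: dists[i])
    if dists[k] <= tol:
        return [points[0], points[-1]]          # arc is good enough
    # split at the farthest vertex; drop the duplicated split point
    return mkarc(points[:k + 1], tol)[:-1] + mkarc(points[k:], tol)
```

With a tolerance of one pixel a rook's-move staircase collapses to a single
segment, which is why dekinking lets sub-pixel tolerances still cut the
vector count substantially.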

	The two large images in Figure 7.3 illustrate a polygon
smoothed with  arc width  tolerances set at  two different  widths in
order  to show  one  recursion of  MKARC.   The eight  smaller images
illustrate the  results of  setting the  arc width  tolerance over  a
range of  values. Because of the dekinking  mentioned earlier the arc
width tolerance can  be equal  to or less  than one  pixel and  still
yield a substantial  reduction in the  number of vectors it  takes to
describe a contour polygon.

	A final  important detail is that the  arc width tolerance is
actually taken as  a function  of the highest  contrast vector  found
along the  arc; so  that high  contrast arcs  are smoothed  with much
smaller  arc  width  tolerances than  are  low  contrast  arcs. After
smoothing, the contrast across each  arc is computed and the  ring of
arcs  replaces the ring  of vectors  of the given  polygon. (Polygons
that would be expressed as only two arcs are deleted).

⊂7.6	Related and Future Image Analysis.⊃

{π}	In general,   robotic image analysis should consist  of three
steps:  first, high quality  pictures are taken  continuously in time
and  space; second,  several  low  level  bulk  operations  (such  as
correlation, filtering,  histogramming and thresholding)  are applied
to  each  image and  to  pairs of  images;  third,   the  rasters are
converted into  linked 2-D  structures which  are further  amalgamated
into  connected 3-D  models.   It  is  clear to  me  that my  present
implementation only has fragile toy  routines where rugged tools  are
needed. Eventually, more kinds of image  features and larger coherent
structures must be  included.  In particular, the contour maps should
be bundled into  regional mosaics and more  features should be  woven
into the node/link structure.

	Contour image  processing is effectively surveyed  by Freeman
(74)  who gives the erroneous impression  that contour images are the
best  image   representation  (rasters   and   mosaics  are   equally
important).  Contours are  applied to  recognition of  silhouettes by
Dudani (70)  using moments  similar to  those explained  in the  next
chapter.  Finally,    my own  acquaintance  with  the  contour  image
representation  was initially  derived from papers  by Zahn  (66) and
Krakauer (71).